Learning-based concept-hierarchy refinement through exploiting topology, content and social information

Authors

  • Tsung-Ting Kuo
  • Shou-De Lin
Abstract

Concept hierarchies, such as the ACM Computing Classification Scheme and InterPro Protein Sequence Classification, are widely used in categorization and indexing applications. In the Internet and Web 2.0 era, new concepts and terms emerge on an almost daily basis, so it is essential that such hierarchies maintain up-to-date records of concepts. This paper proposes a mechanism to identify the most suitable position at which to insert a new term into an existing concept hierarchy. The problem is challenging because there are hundreds or even thousands of candidate positions for insertion. Furthermore, there is usually no training instance available for an insertion, nor is it practical to assume that a detailed description of the target concept is available anywhere except in the hierarchy itself. To resolve the problem, we exploit topology, content, and social information, and apply a learning approach to identify the underlying construction criteria of the concept hierarchy. We use three metrics (namely, accuracy, taxonomic closeness, and ranking) to evaluate the proposed learning-based approach on the ACM CCS, the DOAJ, and the InterPro datasets. The results demonstrate that, in all three metrics, our approach outperforms similarity-based approaches, such as the Normalized Google Distance, by a significant margin. Finally, we propose a level-based recommendation scheme as a novel application of our system. The source code, dataset, and other related resources are available at http://www.csie.ntu.edu.tw/~d97944007/refinement/. © 2011 Elsevier Inc. All rights reserved.
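For readers unfamiliar with the similarity-based baseline named in the abstract, the following is a minimal sketch of how the Normalized Google Distance could be used to rank candidate insertion positions. It is not the paper's learning-based method. The function names, the candidate labels, and all hit counts are hypothetical, and the sketch assumes that occurrence and co-occurrence counts have already been obtained from some search index of size `n`.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance computed from hit counts.

    fx, fy: number of pages containing term x and term y, respectively.
    fxy:    number of pages containing both terms.
    n:      total number of indexed pages (assumed to exceed fx and fy).
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # No occurrences or no co-occurrence: treat the terms as unrelated.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def rank_candidates(term_count, candidates, n):
    """Rank candidate parent concepts for a new term, closest first.

    candidates maps a concept name to (count, joint_count), where
    joint_count is the co-occurrence count with the new term.
    """
    scores = {
        name: ngd(term_count, count, joint, n)
        for name, (count, joint) in candidates.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical counts, for illustration only.
hypothetical_candidates = {
    "Concept A": (100, 10),
    "Concept B": (100, 50),
}
ranking = rank_candidates(100, hypothetical_candidates, 10000)
```

A smaller distance means the two terms co-occur more often relative to their individual frequencies, so the first entry of `ranking` is the position such a baseline would suggest.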


Similar resources

Who Will Follow Whom? Exploiting Semantics for Link Prediction in Attention-Information Networks

Existing approaches for link prediction, in the domain of network science, exploit a network’s topology to predict future connections by assessing existing edges and connections, and inducing links given the presence of mutual nodes. Despite the rise in popularity of Attention-Information Networks (i.e. microblogging platforms) and the production of content within such platforms, no existing wo...


Induction on the Semantic Web

The Semantic Web is increasingly populated with instance data, nowadays often in the form of Linked Data. Consequently, machine learning and other instance driven approaches are of increasing relevance. In this special issue we have collected various inductive approaches and approaches from relational learning for solving a number of tasks. In particular, inductive methods are applied to learn ...


Building a Concept Hierarchy Using Frequent Tag Sequences

Web sites that allow collaborative tagging of resources have become a commonplace development. As part of the second generation of applications available on the Web, these sites provide a tremendous amount of user-generated taxonomic information. However, information seekers are hindered by the lack of organization within these tags. To address this issue, several methods have been proposed for...


Concept Detector Refinement on Social Videos

The explosion of social video sharing sites poses new challenges for video search and indexing techniques. Because of the concept diversity in social videos, it is very hard to build a well-annotated dataset that provides good coverage over the whole meaning of concepts. However, the prosperity of social video also makes it easy to obtain a huge number of videos, which gives an opportunity to ...


Learning Concept Hierarchies through Probabilistic Topic Modeling

With the advent of the semantic web, various tools and techniques have been introduced for presenting and organizing knowledge. Concept hierarchies are one such technique, which has gained significant attention due to its usefulness in creating domain ontologies that are considered an integral part of the semantic web. Automated concept hierarchy learning algorithms focus on extracting relevant concepts ...



Journal title:
  • Inf. Sci.

Volume 181

Publication date 2011